{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Understanding the process"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "import numpy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "def sigmoid(xs):\n",
    "    return 1 / ( 1 + numpy.exp(-xs))\n",
    "\n",
    "def dsigmoid(xs):\n",
    "    fx = sigmoid(xs)\n",
    "    return fx * (1 - fx)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Tensor equations:\n",
    "\n",
    "$$\n",
    "\\begin{align*}\n",
    "y_1 &= w_{11} x_1 + w_{12} x_2 + w_{13} x_3 \\\\\n",
    "y_2 &= w_{21} x_1 + w_{22} x_2 + w_{23} x_3 \\\\\n",
    "a_1 &= \\operatorname{sigmoid}(y_1) = \\phi(y_1) \\\\\n",
    "a_2 &= \\operatorname{sigmoid}(y_2) = \\phi(y_2) \\\\\n",
    "l &= \\dfrac{1}{2} (a_1^2 + a_2^2) \\quad \\text{(scalar)}\n",
    "\\end{align*}\n",
    "$$\n",
    "\n",
    "Differentiating each function separately (vector-valued):"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\n",
    "\\left (\\dfrac{\\partial{\\overrightarrow{y}}}{\\partial{\\overrightarrow{x}}} \\right )^T = \n",
    "\\begin{pmatrix}\n",
    " \\dfrac{\\partial{y_1}}{\\partial{x_1}} & \\dfrac{\\partial{y_2}}{\\partial{x_1}} \\\\ \n",
    " \\dfrac{\\partial{y_1}}{\\partial{x_2}} & \\dfrac{\\partial{y_2}}{\\partial{x_2}} \\\\ \n",
    " \\dfrac{\\partial{y_1}}{\\partial{x_3}} & \\dfrac{\\partial{y_2}}{\\partial{x_3}} \\\\\n",
    "\\end{pmatrix} = \n",
    "\\begin{pmatrix}\n",
    " w_{11} & w_{21} \\\\ \n",
    " w_{12} & w_{22} \\\\ \n",
    " w_{13} & w_{23} \\\\\n",
    "\\end{pmatrix} \\tag{1}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\n",
    "\\left (\\dfrac{\\partial{\\overrightarrow{a}}}{\\partial{\\overrightarrow{y}}} \\right )^T = \n",
    "\\begin{pmatrix}\n",
    " \\dfrac{\\partial{a_1}}{\\partial{y_1}} & \\dfrac{\\partial{a_2}}{\\partial{y_1}} \\\\ \n",
    " \\dfrac{\\partial{a_1}}{\\partial{y_2}} & \\dfrac{\\partial{a_2}}{\\partial{y_2}} \\\\ \n",
    "\\end{pmatrix} = \n",
    "\\begin{pmatrix}\n",
    " \\phi(y_1)(1-\\phi(y_1)) &  0 \\\\ \n",
    " 0 & \\phi(y_2)(1-\\phi(y_2)) \\\\\n",
    "\\end{pmatrix} \\tag {2}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\n",
    "\\left (\\dfrac{\\partial{l}}{\\partial{\\overrightarrow{a}}} \\right )^T = \n",
    "\\begin{pmatrix}\n",
    " \\dfrac{\\partial{l}}{\\partial{a_1}} \\\\ \n",
    " \\dfrac{\\partial{l}}{\\partial{a_2}} \\\\ \n",
    "\\end{pmatrix} = \n",
    "\\begin{pmatrix}\n",
    " a_1 \\\\ \n",
    " a_2 \\\\\n",
    "\\end{pmatrix} = \n",
    "\\begin{pmatrix}\n",
    " \\phi(y_1) \\\\ \n",
    " \\phi(y_2) \\\\\n",
    "\\end{pmatrix} \\label{3} \\tag {3}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "From (1), (2), and (3):\n",
    "$$\n",
    "\\begin{align*}\n",
    "\\left (\\dfrac{\\partial{l}}{\\partial{\\overrightarrow{x}}} \\right )^T &= \n",
    "\\left (\\dfrac{\\partial{\\overrightarrow{y}}}{\\partial{\\overrightarrow{x}}} \\right )^T \n",
    "\\left (\\dfrac{\\partial{\\overrightarrow{a}}}{\\partial{\\overrightarrow{y}}} \\right )^T \n",
    "\\left (\\dfrac{\\partial{l}}{\\partial{\\overrightarrow{a}}} \\right )^T =\n",
    "\\begin{pmatrix}\n",
    " w_{11} & w_{21} \\\\ \n",
    " w_{12} & w_{22} \\\\ \n",
    " w_{13} & w_{23} \\\\\n",
    "\\end{pmatrix}\n",
    "\\begin{pmatrix}\n",
    " \\phi(y_1)(1-\\phi(y_1)) &  0 \\\\ \n",
    " 0 & \\phi(y_2)(1-\\phi(y_2)) \\\\\n",
    "\\end{pmatrix}\n",
    "\\begin{pmatrix}\n",
    " \\phi(y_1) \\\\ \n",
    " \\phi(y_2) \\\\\n",
    "\\end{pmatrix} \\\\\n",
    " &=\n",
    "\\begin{pmatrix}\n",
    " w_{11}\\phi^2(y_1)(1-\\phi(y_1)) + w_{21}\\phi^2(y_2)(1-\\phi(y_2)) \\\\ \n",
    " w_{12}\\phi^2(y_1)(1-\\phi(y_1)) + w_{22}\\phi^2(y_2)(1-\\phi(y_2)) \\\\ \n",
    " w_{13}\\phi^2(y_1)(1-\\phi(y_1)) + w_{23}\\phi^2(y_2)(1-\\phi(y_2)) \\\\ \n",
    "\\end{pmatrix}\n",
    "\\end{align*} \\tag{4}\n",
    "$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[1.],\n",
       "        [2.],\n",
       "        [3.]], requires_grad=True)"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "xs = torch.tensor([[1], [2], [3]], dtype=torch.float, requires_grad=True)\n",
    "xs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "# ws = torch.randn(3, 2, requires_grad=True)\n",
    "# ws = torch.tensor([[1, 0.2], [3, 0.4], [5, 0.6]], dtype=torch.float, requires_grad=True)\n",
    "# ws\n",
    "# print(ws.requires_grad)\n",
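    "# ws = torch.t(ws)  # torch.t returns a new non-leaf tensor; rebinding ws drops the reference to the leaf\n",
    "# print(ws.requires_grad)  # still True, but .grad stays None after backward() for a non-leaf tensor, so the next cell builds ws directly in its final shape"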
   ]
  },
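  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The commented-out cell above rebinds `ws` to `torch.t(ws)`. A minimal sketch of why that seems to lose the gradient, using hypothetical names (`w_leaf`, `w_t`, `x_in`) that are not part of this notebook: the transpose is a non-leaf tensor, while the gradient accumulates on the original leaf it was computed from."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "\n",
    "# Hypothetical names, for illustration only.\n",
    "w_leaf = torch.tensor([[1., 0.2], [3., 0.4], [5., 0.6]], requires_grad=True)  # leaf, shape (3, 2)\n",
    "w_t = torch.t(w_leaf)  # shape (2, 3); produced by an operation, so it is a non-leaf tensor\n",
    "\n",
    "x_in = torch.tensor([[1.], [2.], [3.]])\n",
    "torch.mm(w_t, x_in).sum().backward()\n",
    "\n",
    "print(w_leaf.is_leaf, w_t.is_leaf)\n",
    "print(w_leaf.grad)  # the gradient is stored on the leaf, in the leaf's (3, 2) shape"
   ]
  },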
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[1.0000, 3.0000, 5.0000],\n",
       "        [0.2000, 0.4000, 0.6000]], requires_grad=True)"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ws = torch.tensor([[1, 3, 5], [0.2, 0.4, 0.6]], dtype=torch.float, requires_grad=True)\n",
    "ws"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "torch.Size([2, 3]) torch.Size([3, 1]) torch.Size([2, 1])\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "tensor([[22.0000],\n",
       "        [ 2.8000]], grad_fn=<MmBackward>)"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ys = torch.mm(ws, xs)\n",
    "print(ws.size(), xs.size(), ys.size())\n",
    "ys"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[1.0000],\n",
       "        [0.9427]], grad_fn=<MulBackward0>)"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a_s = 1 / ( 1 + torch.exp(-ys))\n",
    "a_s"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor(1.8886, grad_fn=<SumBackward0>)"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
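    "# Note: no 1/2 factor here, unlike the equation for l above, so every gradient is 2x the analytic result in (4).\n",
    "l = torch.sum(a_s * a_s)\n",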
    "l"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "l.backward(retain_graph=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([[0.0204],\n",
      "        [0.0408],\n",
      "        [0.0611]])\n",
      "None\n",
      "tensor([[5.5789e-10, 1.1158e-09, 1.6737e-09],\n",
      "        [1.0188e-01, 2.0376e-01, 3.0564e-01]])\n"
     ]
    }
   ],
   "source": [
    "print(xs.grad)\n",
    "print(ys.grad)\n",
    "print(ws.grad)"
   ]
  },
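  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `ws.grad` values printed above can be derived with the same chain rule. A sketch, assuming the loss as coded, $l = a_1^2 + a_2^2$ (no $\\frac{1}{2}$ factor, so $\\partial l / \\partial a_i = 2 a_i$):\n",
    "\n",
    "$$\n",
    "\\dfrac{\\partial{l}}{\\partial{w_{ij}}} = \n",
    "\\dfrac{\\partial{l}}{\\partial{a_i}} \\dfrac{\\partial{a_i}}{\\partial{y_i}} \\dfrac{\\partial{y_i}}{\\partial{w_{ij}}} = \n",
    "2 \\phi^2(y_i)(1-\\phi(y_i)) \\, x_j\n",
    "$$\n",
    "\n",
    "Each row of `ws.grad` is therefore proportional to $(x_1, x_2, x_3) = (1, 2, 3)$, which matches the printed output. `ys.grad` prints `None` because `ys` is an intermediate (non-leaf) tensor, and autograd does not keep `.grad` for those by default."
   ]
  },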
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.01018808515111608\n",
      "0.020376170581179024\n",
      "0.030564256011241964\n"
     ]
    }
   ],
   "source": [
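    "# Analytic result from equation (4), which includes the 1/2 factor in l.\n",
    "# xs.grad above is exactly 2x these values because the coded loss omits the 1/2.\n",
    "# ws.grad is dl/dW, a different quantity; ws.grad[1] is 10x these values only because w_2j = 0.2 * x_j here and the phi1 terms are negligible.\n",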
    "phi1 = sigmoid(22)\n",
    "phi2 = sigmoid(2.8)\n",
    "\n",
    "print(1*phi1*phi1*(1-phi1) + 0.2*phi2*phi2*(1-phi2))\n",
    "print(3*phi1*phi1*(1-phi1) + 0.4*phi2*phi2*(1-phi2))\n",
    "print(5*phi1*phi1*(1-phi1) + 0.6*phi2*phi2*(1-phi2))"
   ]
  },
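  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To compare equation (4) with autograd directly, the loss has to include the $\\frac{1}{2}$ factor. The following is a minimal sketch under that assumption. The `_chk` names and the use of `torch.sigmoid` are illustrative choices, not part of the cells above; fresh tensors are created so that previously accumulated `.grad` values do not interfere."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "\n",
    "# Same values as xs and ws above, but fresh leaf tensors (hypothetical names).\n",
    "xs_chk = torch.tensor([[1.], [2.], [3.]], requires_grad=True)\n",
    "ws_chk = torch.tensor([[1., 3., 5.], [0.2, 0.4, 0.6]], requires_grad=True)\n",
    "\n",
    "a_chk = torch.sigmoid(torch.mm(ws_chk, xs_chk))\n",
    "l_chk = 0.5 * torch.sum(a_chk * a_chk)  # loss with the 1/2 factor from the equations\n",
    "l_chk.backward()\n",
    "\n",
    "# Equation (4): W^T @ (phi(y)^2 * (1 - phi(y))), evaluated without tracking gradients.\n",
    "with torch.no_grad():\n",
    "    manual = torch.mm(ws_chk.t(), a_chk * a_chk * (1 - a_chk))\n",
    "\n",
    "print(xs_chk.grad)\n",
    "print(manual)\n",
    "print(torch.allclose(xs_chk.grad, manual))"
   ]
  },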
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example from the official website"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([[1., 1.],\n",
      "        [1., 1.]], requires_grad=True)\n"
     ]
    }
   ],
   "source": [
    "x = torch.ones(2, 2, requires_grad=True)\n",
    "print(x)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([[1., 2.],\n",
      "        [3., 1.]], requires_grad=True)\n"
     ]
    }
   ],
   "source": [
    "x2 = torch.tensor([[1., 2.], [3., 1.]], requires_grad=True)\n",
    "print(x2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([[2., 3.],\n",
      "        [4., 2.]], grad_fn=<AddBackward0>)\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "tensor([[2., 3.],\n",
       "        [4., 2.]], grad_fn=<AddBackward0>)"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# y = torch.add(x, x2)\n",
    "y = x + x2\n",
    "print(y)\n",
    "y.requires_grad_(True)  # y already requires grad; y.grad still prints None because y is a non-leaf tensor. Calling y.retain_grad() before backward() would keep it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([[12., 27.],\n",
      "        [48., 12.]], grad_fn=<MulBackward0>) tensor(24.7500, grad_fn=<MeanBackward0>)\n"
     ]
    }
   ],
   "source": [
    "z = y * y * 3\n",
    "out = z.mean()\n",
    "\n",
    "print(z, out)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "out.backward(retain_graph=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "True tensor([[3.0000, 4.5000],\n",
      "        [6.0000, 3.0000]])\n"
     ]
    }
   ],
   "source": [
    "print(x.requires_grad, x.grad)"
   ]
  },
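  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The printed `x.grad` can be checked by hand. A sketch from the definitions in the cells above, with $y = x + x_2$, $z = 3y^2$, and `out` the mean over the four entries:\n",
    "\n",
    "$$\n",
    "\\mathrm{out} = \\dfrac{1}{4} \\sum_{i} 3 y_i^2, \\qquad\n",
    "\\dfrac{\\partial{\\,\\mathrm{out}}}{\\partial{x_i}} = \n",
    "\\dfrac{\\partial{\\,\\mathrm{out}}}{\\partial{y_i}} \\dfrac{\\partial{y_i}}{\\partial{x_i}} = \\dfrac{3}{2} y_i\n",
    "$$\n",
    "\n",
    "With $y = \\begin{pmatrix} 2 & 3 \\\\ 4 & 2 \\end{pmatrix}$ this gives $\\begin{pmatrix} 3 & 4.5 \\\\ 6 & 3 \\end{pmatrix}$, the tensor printed above. The gradient with respect to `x2` is the same, since $y$ depends on `x2` with the same unit coefficient."
   ]
  },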
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "True None\n"
     ]
    }
   ],
   "source": [
    "print(y.requires_grad, y.grad)"
   ]
  },
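  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`y.grad` is `None` above because `y` is the result of an operation (a non-leaf tensor), and autograd only keeps `.grad` on leaf tensors by default. A minimal sketch of one way to inspect such a gradient, using `Tensor.retain_grad()`; the `_demo` names are hypothetical and kept separate from the notebook's variables."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "\n",
    "x_demo = torch.ones(2, 2, requires_grad=True)  # leaf tensor\n",
    "y_demo = x_demo + 2                            # non-leaf: created by an operation\n",
    "y_demo.retain_grad()                           # ask autograd to keep y_demo.grad after backward()\n",
    "\n",
    "out_demo = (y_demo * y_demo * 3).mean()\n",
    "out_demo.backward()\n",
    "\n",
    "print(x_demo.is_leaf, y_demo.is_leaf)\n",
    "print(y_demo.grad)  # d(out)/dy = 6 * y / 4 = 1.5 * y"
   ]
  },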
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "True tensor([[3.0000, 4.5000],\n",
      "        [6.0000, 3.0000]])\n"
     ]
    }
   ],
   "source": [
    "print(x2.requires_grad, x2.grad)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
